Abstract. This article describes a set function that maps a set of Pareto
optimal points to a scalar. A proof is presented that shows that the
maximization of this scalar value constitutes the necessary and sufficient
condition for the function’s arguments to be maximally diverse Pareto
optimal solutions of a discrete, multi-objective optimization problem. This
scalar quantity, a hypervolume based on a Lebesgue measure, is therefore
the best metric to assess the quality of multi-objective optimization
algorithms. Moreover, it can be used as the objective function in
simulated annealing (SA) to induce convergence in probability to the Pareto
optima. An efficient algorithm for calculating this scalar and an analysis of
its complexity are presented.


1  Introduction 

This article describes a measure theoretic approach for defining a set
function that can be utilized for solving multi-objective optimization problems
(MOPs). Zitzler et al. introduced the foundation for this set function in
the following passage:

In the two dimensional case each Pareto optimal solution x covers an
area, a rectangle, defined by the points (0,0) and $(f_1(x), f_2(x))$. The
union of all rectangles covered by the Pareto optimal solutions constitutes
the space totally covered, its size is used as measure. This concept
may be canonically extended to multiple dimensions [1].

This article embellishes this notion of a set-cover measure by 1) extending it
to an arbitrary number of dimensions, 2) rigorously proving that the
maximization of the associated set function’s scalar output is the necessary and
sufficient condition for its arguments to be Pareto optimal solutions to a
multi-objective optimization problem, and finally, 3) using insights from this
proof to develop an efficient algorithm for computing the value of this set
function. Analysis of the algorithm’s complexity is also provided.

* The author was supported in part by the Center for Satellite and Hybrid
Communications Networks in the Institute for Systems Research at the University of
Maryland, College Park, and through collaborative participation in the
Collaborative Technology Alliance for Communications & Networks sponsored by the U.S.
Army Research Laboratory under Cooperative Agreement DAAD19-01-2-0011 and
the National Aeronautics and Space Administration under award No. NCC8-235.

As the reader will no doubt discover, the intuition for this scalar is quite
simple, yet its first appearance was surprisingly quite recent (much to the
frustration of the author) [1-3]. Although in the two dimensional case (two objective
functions) proving the validity of this measure seems almost trivial, for an
arbitrary number of objective functions the proof seems less obvious. Presenting a
formal proof therefore serves four purposes:

1. it establishes, with mathematical rigor, that the maximum value of the set
function is a necessary and sufficient condition for the Pareto optimality of
the function’s arguments, hence, is the best¹ measure for evaluating heuristics
that seek to find Pareto optima;

2.  it  therefore  provides  a  sound  mathematical  basis  for  comparisons  to  other 
similar  measures  or  approximations; 

3.  the  proof  points  the  way  to  a  simple  approach  for  calculating  the  measure; 

4.  it  provides  a  mechanism  for  generalizing  any  optimization  metaheuristic  to 
handle  multiple  objectives. 

With regard to Point 4, this scalar can be used, e.g., as the objective function
in simulated annealing (SA). Because it is well-known that SA converges in
probability to the global optima [6], using this set function as the objective
function in SA induces SA to converge in probability to Pareto optima!²

This article is organized as follows: Section 2 provides background on
approaches for solving multi-objective optimization problems and recent results in
the  literature.  This  includes  a  philosophical  discussion  of  the  issues  surrounding 
the  relative  merits  of  using  genetic  algorithms  (GAs)  versus  SA.  Although  some 
of  these  issues  will  be  further  explored  in  future  work,  this  discussion  provides 
motivation  for  what  is  to  follow.  Formal  definitions  of  Pareto  optimality  and 
other  mathematical  elements  are  described  in  Section  3.  Section  4  describes  the 
hypervolume  in  MOPs,  its  mathematical  characteristics  and  presents  the  main 
results.  Section  5  presents  an  efficient  algorithm  for  computing  this  scalar  and  an 
analysis  of  its  complexity.  Finally,  Section  6  discusses  issues  for  future  research 
and  provides  concluding  remarks. 

2  Background 

2.1  Considerations  of  GAs  vs.  SA 

Quite a few multi-objective algorithms have been described in recent years, often
motivated by design optimization problems. Most of these approaches have been
based on GAs (see e.g., [7,8]) although some have been based on SA [9]. While
the level of research into multi-objective GAs seems to dominate similar research
using an SA approach, SA may provide some untapped potential and have several
advantages over GAs for solving MOPs.

1 Regarding Points 1 and 2, practicalities may suggest other measures with more utility
depending upon the nature of the problem and the algorithm used to solve it (see
e.g., [4,5]).

2 This is the only mathematically convergent approach to Pareto optima this author
is aware of.

One clear advantage of SA is its mathematical convergence properties
described earlier. As will become clear later on, using the set function as the
objective function in SA forces convergence to points in objective function space
that are distinct.³ This means that SA will converge to solutions with as good
a ‘spread’ as possible. This diversity of solutions has been cited in a number
of articles as an indicator of good performance of multi-objective optimization
algorithms [2,4,8].

Notwithstanding  the  mathematical  convergence  of  SA,  GAs  offer  advantages 
in  problems  where  some  structure  exists  in  the  underlying  domain.  Indeed,  the 
very  elements  of  the  GAs,  in  particular  the  crossover  operator,  lend  themselves 
toward  propagating  those  features  in  a  chromosome  that  tend  to  be  associated 
with  high  fitness  values  [10].  These  particular  advantages  of  GAs  may  however 
be  diminished  when  compared  to  an  SA-based  approach  in  the  context  of  MOPs. 

Quite often in MOPs, features associated with the underlying structure are
masked and confounded by the interplay of several competing objective
functions. Pareto optima may correspond to solutions that are not local optima of
any of the objective functions: they may lie on the sides of hills rather than
at their tops or bottoms. In other words, the pre-image of Pareto optima may
be scattered throughout the domain space in an apparently haphazard or
random manner, rendering any structure within it to be of little or no consequence.
As such, the thermodynamic approach inherent in SA may be more suitable for
solving MOPs than GAs.

Before  an  elegant  SA  approach  for  solving  MOPs  is  a  realistic  possibility, 
however,  an  efficient  method  for  calculating  this  set  function  value  is  needed. 
To  date,  this  has  not  been  done  even  in  the  context  of  GAs  although  some 
GA  approaches  have  indirectly  utilized  this  notion  of  a  scalar.  Wu  et  al.  [4] 
quantify  a  hyperarea  difference  metric  closely  related  to  the  hypervolume  based 
on  the  Lebesgue  measure  of  the  set  of  dominated  points.  Fonseca,  et  al.  [5] 
describe  the  concept  of  the  “attainment  surface”  which  attempts  to  quantify 
how  different  chromosomes  contribute  to  a  performance  metric  linked  to  this 
scalar. Other methods involve archiving or updating solutions that are
non-dominated [11], hence indirectly maximize this scalar. All of these methods, in
various  ways,  attempt  to  produce  solutions  that  ultimately  maximize  the  value 
of  this  scalar,  but  avoid  dealing  with  or  computing  it  directly  ostensibly  for 
various  reasons:  either  it  is  not  computationally  feasible,  it  is  too  difficult  given 
some  formulations,  it  can  be  well  approximated,  or  other  indirect  methods  are 
simpler  (in  some  sense). 

To illustrate why, perhaps, this measure has not been used directly, consider
the application of the inclusion-exclusion formula described in [12] (see also
[4]). For sets $F_1, F_2, \ldots, F_n$ (using William’s notation here) with $\mu$ the Lebesgue
measure:

\[
\mu\Bigl(\bigcup_{i \le n} F_i\Bigr) = \sum_{i \le n} \mu(F_i) - \sum_{i<j \le n} \mu(F_i \cap F_j) + \sum_{i<j<k \le n} \mu(F_i \cap F_j \cap F_k) - \cdots + (-1)^{n-1} \mu(F_1 \cap F_2 \cap \cdots \cap F_n) \tag{1}
\]

successive partial sums alternating between over- and under-estimates. [12, p.21].

3 This depends on how many decision variables relative to Pareto optima are used in
SA.

This rather ugly and unwieldy expression (in terms of computation) is due
to the number of combinations of intersection sets the measures of which must
be added and subtracted from the sum of the unions. Indeed, Wu et al. [13, p.
47] show a closed form solution for their hyperarea difference measure,
something directly related to the hypervolume described here, that accounts for just
three points that was quite “cumbersome” [4, see e.g., Eq.(16) on p. 21].
Accounting for more points with more objective functions further complicates this
approach. See also [2-4] for related works on quality metrics. Figure 2 illustrates
the potential difficulties in computing this hypervolume that can arise from the
topological complications in higher dimensions. It also provides hints for the
efficient computation of this scalar inspired by the proof in Section 4 and the area
of computational geometry.
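
To make the combinatorial cost of Equation (1) concrete, the following Python sketch applies inclusion-exclusion to axis-aligned boxes anchored at a common lower bound, assuming all objectives are maximized. The function names and the term counter are illustrative assumptions, not part of the original text; the four points are the ones used in the example of Section 5.

```python
from itertools import combinations
from math import prod

def box_volume(corner, lower):
    """Volume of the box [lower, corner]; zero if the box is empty."""
    return prod(max(c - l, 0) for c, l in zip(corner, lower))

def union_volume_inclusion_exclusion(points, lower):
    """Measure of the union of boxes [lower, p] via Equation (1).

    Returns the volume together with the number of intersection terms
    evaluated, which is 2**len(points) - 1.
    """
    total, terms = 0, 0
    for k in range(1, len(points) + 1):
        sign = 1 if k % 2 else -1
        for subset in combinations(points, k):
            # Boxes sharing the corner `lower` intersect in the box whose
            # upper corner is the coordinate-wise minimum of the subset.
            corner = [min(coords) for coords in zip(*subset)]
            total += sign * box_volume(corner, lower)
            terms += 1
    return total, terms

pts = [(2, 2, 2), (1, 3, 1), (1, 1, 3), (3, 1, 1)]
volume, terms = union_volume_inclusion_exclusion(pts, (0, 0, 0))
```

For these four points the sum has 15 terms (17 - 9 + 4 - 1 = 11); the number of terms doubles with every additional point, which is the difficulty the text describes.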

2.2  Computational  Geometry 

Computation of the hypervolume has a similar flavor to problems in
computational geometry, a relatively new area of computer science (see e.g., [14]). It
turns out that Pareto optimal points constitute a maximal set of points (they
have identical definitions). In the field of computer science, many algorithms
pertaining to maximal sets have been studied. For example, articles have focused
on dynamically maintaining a list of maximal points [15, 16]. Unfortunately, the
field of computational geometry has focused on problems that can be visualized
or rendered on a computer screen [14, p.2].

Notwithstanding the research on maximal sets, there does not seem to be any
literature from the computer science community concerning the hypervolume of
space covered by maximal sets, even in the basic texts cited earlier. This could
be due to the fact that the problem seems too easy or uninteresting, or there
is no motivation, or possibly that researchers assume others have already dealt
with these problems and issues. Despite looking for some time, no results
in the computer science literature similar to the ones presented here have been
found. This seems to be the state of affairs in the optimization community as
well, except for those references cited herein that refer to Zitzler et al. [1].

3  Mathematical  Preliminaries  and  Definitions 

Before describing the set function in detail, a mathematical framework is
necessary. The following describes the basic elements of this framework and the notion
of Pareto optimality.

Let $X \subset \mathbb{R}^d$ be a finite set of $s$ feasible points in $\mathbb{R}^d$ for some MOP with
objective functions $f_i$, $i \in \{1, \ldots, n\}$, where for each $i$

\[
f_i : \mathbb{R}^d \to \mathbb{R}^1.
\]

Also, let $X_p \subset X$ be the set of Pareto optima (defined below) in set $X$ with $p \le s$
the number of Pareto optima. Thus for MOPs with $n$ objectives, each vector
$x \in X$ produces $n$ (real) objective function values $f(x) = \{f_1(x), \ldots, f_n(x)\}$
corresponding to a single point $p_x$ in objective function space. The image of
set $X$ is therefore a finite set of points $p_x \in \mathbb{R}^n$, denoted by $S$, with $s' \le s$
elements. Thus, the vector valued mapping $f(X) = S$ is onto and not necessarily
one-to-one.

Definition 1.⁴ A point $p_x = (f_1(x), \ldots, f_n(x))$ is Pareto optimal if for all
feasible points $p = (f_1, \ldots, f_n) \in S$ (this means the corresponding points x are
feasible), there exists an $i$ such that $f_i(x) < f_i$ or for all $i$, $f_i(x) \le f_i$ (this latter
case again refers to a single point that dominates all other feasible points).

Definition 2. A point $p_x \in S$ corresponding to solution x is non-dominated
with respect to set $S$ if and only if for all other points $p_y \in S$ there exists
an $i$ such that $f_i(x) < f_i(y)$ (assume that if $S$ has a single element it is
non-dominated with respect to set $S$). A set $S$ is a non-dominated set if all points
$p \in S$ are non-dominated with respect to set $S$.
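
As an illustration of Definition 2, the following Python sketch tests non-dominance for minimizing objectives and filters a finite set of objective vectors. The function names and the sample data are hypothetical and chosen only for illustration.

```python
def is_non_dominated(px, S):
    """Definition 2 (minimization): px is non-dominated with respect to S if,
    against every other point py in S, px is strictly smaller in at least
    one objective."""
    return all(any(fx < fy for fx, fy in zip(px, py)) for py in S if py != px)

def non_dominated_set(S):
    """Return the non-dominated points of S, treating S as a set (duplicates removed)."""
    unique = list(dict.fromkeys(map(tuple, S)))
    return [p for p in unique if is_non_dominated(p, unique)]

# Hypothetical objective vectors for a two-objective minimization problem.
S = [(1, 4), (2, 2), (4, 1), (3, 3), (4, 4)]
front = non_dominated_set(S)
```

Here (3, 3) and (4, 4) are removed because (2, 2) is at least as good in both objectives, so no objective exists in which they are strictly smaller.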

4  The  Hypervolume 

The goal of this section is to define the set function that maps a subset (or the
entire set) of Pareto optima to a scalar. Let $m$ be the number of arguments $x \in X$
of a set function $F$. These $m$ points in $X$ map to $m$ points $p \in S$ in objective
function space which must then map to a single scalar, the hypervolume $\mu$.
Equation (2) makes this mapping clear:

\[
\left.
\begin{array}{c}
\{f_1(x_1), f_2(x_1), \ldots, f_n(x_1)\} = p_1 \\
\vdots \\
\{f_1(x_l), f_2(x_l), \ldots, f_n(x_l)\} = p_l \\
\vdots \\
\{f_1(x_m), f_2(x_m), \ldots, f_n(x_m)\} = p_m
\end{array}
\right\} \mapsto \mu. \tag{2}
\]

4 This definition is often written in the negative. That is, a solution x is in the set $P$ of
Pareto optimal solutions if there is no solution y such that $f_i(x) \ge f_i(y)$ for all $i$ and there
exists an $i$ where $f_i(x) > f_i(y)$. Equivalently, $x \notin P$ if $\exists y \in \Omega$, where $\forall i, f_i(x) \ge
f_i(y) \wedge \exists i, f_i(x) > f_i(y)$. Thus, the negation of this statement is used above to define
the Pareto optima in positive terms.

5 Without loss of generality and to make the definition less confusing, minimizing
objective functions are assumed here.

Consequently, some function $F : \mathbb{R}^{n \times m} \to \mathbb{R}^1$ must be defined. This is made
possible by generalizing the concept of optimality using a measure theoretic
approach and extending the associated measures of performance to the
multi-dimensional case. Zitzler [1] in effect⁶ uses the interval length $M - f(x)$ between
some upper bound on the objective function values, $M$, and the objective
function value as a measure of performance for a given objective function.⁷ This
interval captures the important feature of a performance measure, the ability to
rank solutions according to their desirability. For given $M$, the larger the interval
length, the smaller the objective function value, hence, the better the solution.

Generalizing the notion of interval in Zitzler et al. [1] as a set measure and
establishing the mappings in (2) requires the following formal definitions. For $n$ objective
functions, a solution x defines the following dominance set and measure.

Definition 3. Lebesgue Measure of the Deleted Dominated Set: Let $p =
(f_1, f_2, \ldots, f_n)$ represent a point in objective function space where, without loss of
generality, $i = 1, \ldots, j$ are indices of minimization functions and $i = j+1, \ldots, n$
are the indices of maximization functions. Let $f_i(x)$ be a particular value of
$f_i$ produced by solution x where $M_i$ and $m_i$ are the upper and lower bounds
for minimization and maximization objective functions, respectively. Then the
deleted dominated set $D_x = \{p : \forall i, f_i \in ([m_i, f_i(x)] \cup [f_i(x), M_i]) \wedge p \ne
p_x\}$⁸ and constitutes a set of points $p$ strictly inferior to $p_x$. The Lebesgue
measure of this set is

\[
\mu(D_x) = \Bigl(\prod_{i=1}^{j} [M_i - f_i(x)]\Bigr) \Bigl(\prod_{i=j+1}^{n} [f_i(x) - m_i]\Bigr).
\]
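
The product formula in Definition 3 can be sketched in Python as follows. The function name, the boolean `minimize` flags and the sample values are assumptions introduced for illustration; empty intervals are given zero length, in the spirit of the notational convention in footnote 8.

```python
from math import prod

def deleted_dominated_measure(fx, minimize, lower, upper):
    """mu(D_x): product of M_i - f_i(x) over minimized objectives and
    f_i(x) - m_i over maximized objectives."""
    edges = [(M - f) if is_min else (f - m)
             for f, is_min, m, M in zip(fx, minimize, lower, upper)]
    # An objective value outside its bound yields an empty interval (length 0).
    return prod(max(e, 0) for e in edges)

fx = (1, 3, 2)                       # hypothetical objective values f_i(x)
lower, upper = (0, 0, 0), (7, 7, 7)  # assumed bounds m_i and M_i

all_min = deleted_dominated_measure(fx, (True, True, True), lower, upper)
mixed = deleted_dominated_measure(fx, (True, True, False), lower, upper)
```

With all three objectives minimized the edges are 6, 4 and 5; if the third objective is maximized instead, its edge becomes $f_3(x) - m_3 = 2$.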

The  following  lemmas  will  be  useful  in  proving  the  main  result. 

Lemma 1. Given a finite set of points $S$ in objective function space, point $p_x \in
S$ is dominated if and only if there exists a $p_y \in S$ such that $p_x \in D_y$.

Proof. This follows directly from application of Definition 3.

Corollary to Lemma 1: Point $p_x \in S$ is non-dominated with respect to $S$ if
and only if for all $p_y \in S$, $p_x \notin D_y$.

Proof. See Appendix A.1.

With several points $x_1, \ldots, x_m$, the union of the corresponding deleted
dominated sets constitutes the set of dominated points defined by a finite number of
Pareto optima. The measure of this set therefore is a measure of performance
for MOPs. The following definition makes this clear.⁹

6 The quote in Section 1 was obviously referring to a maximization problem. Here we
extend this notion by defining upper bounds on minimizing objective functions.

7 Stating it this way is subtly different from stating that this measure is the size of the
set cover of dominated solutions (which it of course is; see [3]). It is this subtle
difference that indicates we can use this measure as the objective function in an
optimization algorithm provided an efficient algorithm exists to calculate its value.

8 For convenient notation, we assume that if $m_i < f_i(x)$, i.e., where $m_i$ is a lower
bound for a maximization function $f_i$, then the interval $[f_i(x), m_i] = \emptyset$ and similarly
for the case where $M_i$ is an upper bound for a minimizing function $f_i$. This notation
effectively deals with both minimization and maximization objectives.

Definition 4. Let $S_m = \{p_1, \ldots, p_m\} \subset S$, a set of $m$ feasible points in
objective function space. Then the dominance set $D_{S_m}$ is the union of the dominance
sets of each element of set $S_m$. That is, $D_{S_m} = \bigcup_{i=1}^{m} D_{p_i}$ and the measure of
this set is $\mu(D_{S_m}) = \mu\bigl(\bigcup_{i=1}^{m} D_{p_i}\bigr)$.

Lemma 2. Dominance Calculus: From the previous definitions the following
statements are true for any point sets $A$ and $B$,

\[
D_A \cup D_B = D_{A \cup B}
\]
\[
\text{If } A \cap B \ne \emptyset, \text{ then } D_A \cap D_B = D_{A \cap B}.
\]

Proof. See A.2.

The  following  definitions  and  lemmas  show  important  relationships  among 
points  in  a  set  S  and  bounds  on  objective  function  values  and  will  be  used  to 
prove  the  main  result  in  Theorem  1.  For  notational  simplicity  and  without  loss 
of  generality,  we  shall  assume  all  objective  functions  are  to  be  minimized.  The 
following  definitions  are  needed: 

Definition 5.

$F_i = \{m_i, f_i(x_1), f_i(x_2), \ldots, f_i(x_m), M_i\}$, the set of the ith objective function
values among elements in set $S$ and their lower and upper bounds.

$u_i(p_x)$, the least upper bound of $f_i(x)$ in set $F_i$.

$l_i(p_x)$, the greatest lower bound of $f_i(x)$ in set $F_i$.

$D'_x = \{(f_1, f_2, \ldots, f_n) : \forall i, f_i(x) < f_i < u_i(p_x)\}$, the set of points in a
minimization problem exclusive to set $D_x$. See Lemma 3 below.

For example, given $p_1 = (1, 3, 2)$, $p_2 = (4, 1, 6)$, $p_3 = (4, 5, 1)$ with $m_i = 0$ and
$M_i = 7$ for all $i$, then $F_2 = \{0, 3, 1, 5, 7\}$, $u_2(p_1) = 5$, $u_2(p_2) = 3$ and

\[
D'_{p_2} = \{(f_1, f_2, f_3) : 4 < f_1 < 7,\ 1 < f_2 < 3,\ 6 < f_3 < 7\}. \tag{3}
\]
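
The quantities in this example can be reproduced with a short Python sketch. It assumes that the "least upper bound of $f_i(x)$ in $F_i$" means the smallest element of $F_i$ strictly greater than $f_i(x)$, which is the reading consistent with the values quoted above; the function name is hypothetical.

```python
def exclusive_box(px, S, lower, upper):
    """Per-objective intervals (f_i(x), u_i(p_x)) bounding D'_x in a
    minimization problem; assumes f_i(x) < M_i for every i."""
    box = []
    for i, fi in enumerate(px):
        F_i = {lower[i], upper[i]} | {p[i] for p in S}
        u_i = min(v for v in F_i if v > fi)  # smallest value in F_i above f_i(x)
        box.append((fi, u_i))
    return box

S = [(1, 3, 2), (4, 1, 6), (4, 5, 1)]
lower, upper = (0, 0, 0), (7, 7, 7)

box_p1 = exclusive_box(S[0], S, lower, upper)
box_p2 = exclusive_box(S[1], S, lower, upper)
```

The second entry of `box_p1` gives $u_2(p_1) = 5$, and `box_p2` reproduces the three intervals of Equation (3).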

Figure 1 illustrates set $D'_{p_x}$ for the two-dimensional case and the relationships
of the definitions above. These will help to clarify elements of the proof. Notice
that the shaded area indicated by hash marks associated with $p_x$ shows a set of
points exclusive to $D_{p_x}$ that add to the measure of set $S_m$.

The  following  lemma  is  a  key  element  in  proving  the  main  result  and  provides 
the  basic  idea  behind  the  algorithm  described  in  Section  5. 

9 Note that $p_x = (f_1(x), f_2(x), \ldots, f_n(x))$ and will often be denoted using the simpler
notation $(f_1, f_2, \ldots, f_n)$ and $p$ where it is sufficiently clear that we mean $(f_1(x), \ldots)$
and $p_x$. Also, depending on the context, the dominance set associated with some
point $p_x$ or $p_i$ will be denoted as $D_{p_x} = D_x$ and $D_{p_i} = D_i$, respectively.

Fig. 1. Relationships of Points, $D'_x$ and Upper bounds


Lemma 3. For every non-dominated point $p_x \in S$ there exists a set of points
$D'_x \subset D_x$ such that for all $p' \in D'_x$, $p'$ is non-dominated with respect to set
$S \setminus p_x$ and such that for all $p_y \in S$, $D'_x \cap D_y = \emptyset$.

Proof. See A.3.

Corollary to Lemma 3: For all sets $D'_x \subset D_S$, $\mu(D'_x) > 0$.

Proof. See A.3.

Lemma 4. Given points $p_x, p_y \in S$, $p_x$ dominates $p_y$ if and only if $D_y \subset D_x$.

Proof. See Appendix A.4.

Corollary to Lemma 4: If $p_x$ dominates $p_y$ then $\mu(D_x \cup D_y) = \mu(D_x) >
\mu(D_y)$.

Proof. See Appendix A.4.

Theorem 1 shows that the measure of set $S_m$ achieves its maximum value if
and only if points $p \in S_m$ are Pareto optimal.

Theorem 1. Given set $S_m$ of $m$ points in objective function space in an MOP
with $p$ Pareto optimal solutions, let $M_i$ be the given bounds for $f_i$ (for the sake of
clarity and without loss of generality, we assume each objective function is to be
minimized and the $M_i$ are therefore upper bounds on $f_i$). Let $F(x_1, x_2, \ldots, x_m) =
\mu(D_{S_m})$ be a set function mapping a subset of points $S_m \subset S$ to the Lebesgue
measure of the dominance set. Then the following are true:

Case 1 ($m \le p$): If $F$ is at its maximum value then all $m$ points in $S_m$ are
Pareto optimal and for all $p_k, p_l \in S_m$, $k \ne l \Rightarrow p_k \ne p_l$.

Case 2 ($m > p$): $F$ is at its maximum value if and only if there is a
subset $S' \subset S_m$ of size $p$ such that all $p \in S'$ are Pareto optimal and for all
$p_k, p_l \in S'$, $k \ne l \Rightarrow p_k \ne p_l$.

Proof. See B.1.
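
The statement of Theorem 1 can be checked by brute force on a small instance. The Python sketch below is not the proof; it assumes a hypothetical two-objective minimization problem with integer objective values, so that the Lebesgue measure of the dominance set can be obtained by counting unit cells. All names and data are illustrative.

```python
from itertools import combinations

def dominated_area(points, upper):
    """Measure of the union of boxes [p, upper] (minimization), counted in
    unit cells; assumes integer coordinates in two dimensions."""
    return sum(
        1
        for a in range(upper[0])
        for b in range(upper[1])
        if any(p[0] <= a and p[1] <= b for p in points)
    )

def maximizers(S, m, upper):
    """Maximum of F over all m-point subsets of S, and the subsets attaining it."""
    scored = [(dominated_area(sub, upper), set(sub)) for sub in combinations(S, m)]
    best = max(area for area, _ in scored)
    return best, [sub for area, sub in scored if area == best]

S = [(1, 4), (2, 2), (4, 1), (3, 3), (4, 4)]  # hypothetical objective vectors
pareto = {(1, 4), (2, 2), (4, 1)}             # the non-dominated points of S
upper = (5, 5)                                # assumed bounds M_1 = M_2 = 5

best2, subsets2 = maximizers(S, 2, upper)  # fewer arguments than Pareto optima
best3, subsets3 = maximizers(S, 3, upper)  # exactly as many
best4, subsets4 = maximizers(S, 4, upper)  # more arguments than Pareto optima
```

In this instance every maximizing pair consists of Pareto optimal points, the only maximizing triple is the Pareto set itself, and every maximizing 4-point subset contains all three Pareto optima.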
5  Calculating  the  Hypervolume

As noted earlier, calculating the hypervolume based on the inclusion-exclusion
formula can be quite messy. The basic idea behind the approach described here
stems directly from Lemma 3, its corollaries, and Theorem 1: the algorithm
successively lops off hypercubes containing points in sets $D'_{p_x}$ and adds their volume
to a partial sum. This ‘lopping off’ procedure continues until there is nothing
left with positive measure.

The  algorithm  works  by  storing  the  original  set  of  points  S  in  a  list  L.  The 
lopped  off  hypercube  is  ‘removed’  from  L  by  the  computation  of  new  points 
created  by  its  ‘removal’  using  the  SpawnData  procedure.  The  ‘spawned’  points 
that  are  nondominated  with  respect  to  the  remaining  points  in  L  are  added  to 
L.  The  size  of  L  therefore  grows  and  shrinks  as  this  process  continues  inevitably 
halting  when  the  last  vector’s  volume  is  added  to  the  partial  sum. 

This procedure avoids the necessity of dealing with intersection sets and
works for an arbitrary number of objective functions and points. In the following
pseudocode, two data structures, List and SpawnData, hold the original and
spawned vectors respectively, and Size equals the number of vectors in List.

The LebMeasure Algorithm

Initialize: LebMeasure = 0.0;
            newSize = Size;
while (newSize > 1) do: {
    lopOffVol := 1.0;
    get first vector p1 in List
    for (i = 0; i < n; i++) do {
        bi := getBoundValue(fi(x1));   // bi is either ui(p1) or li(p1)
        spawnVector(p1, i, bi);        // Add spawned vectors to SpawnData
        lopOffVol *= |fi(x1) - bi|;
    }
    LebMeasure += lopOffVol;
    delete p1 from List
    newSize = ndFilter(List, SpawnData);
        // Check if any List vectors dominate SpawnData vectors
    clear SpawnData
} end of while loop.
lastVol = 1.0;
for (i = 0; i < n; i++) {
    lastVol *= |fi(x1) - {mi, Mi}|;
}
LebMeasure += lastVol;   // add the volume of the last remaining vector
return(LebMeasure);

getBoundValue($f_i(x_1)$): This routine compares $f_i(x_1)$ to the corresponding
elements in vectors 2 through Size and returns either $u_i(p_1)$ or $l_i(p_1)$ depending
on whether $f_i$ is to be minimized or maximized, respectively, and assigns this
value to $b_i$ (note the notation in the pseudocode). Its time complexity is therefore
$(L - 1)n$ where $L$ is the number of original vectors in List. Thus, in the first
while loop, the time complexity is $(m - 1)n$.

spawnVector($p_1$, $i$, $b_i$): This routine creates the following $n$ vectors (is
executed $n$ times) based on the removal of $p_1$ from List (based on maximizing
objectives):¹⁰

\[
\begin{aligned}
p_{11} &= \{l_1(p_1), f_2, \ldots, f_n\} = \{b_1, f_2, \ldots, f_n\} \\
p_{12} &= \{f_1, l_2(p_1), \ldots, f_n\} = \{f_1, b_2, \ldots, f_n\} \\
&\ \ \vdots \\
p_{1n} &= \{f_1, f_2, \ldots, l_n(p_1)\} = \{f_1, f_2, \ldots, b_n\}
\end{aligned}
\tag{4}
\]

The notation $p_{1j}$ refers to the jth spawned vector from $p_1$ in List. These $n$ vectors
comprise the SpawnData data structure. Note that all vectors in SpawnData are
non-dominated with respect to SpawnData.

ndFilter: This routine compares the vectors in List to those in SpawnData and
deletes any vectors in SpawnData that are either dominated by a vector in List
or have an $m_i$ or $M_i$ as one of their elements. The remaining SpawnData vectors
are then inserted into List, replacing $p_1$.

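The following example illustrates this algorithm’s operation on List which
produces the following SpawnData data structure after the first while loop is
completed:¹¹

\[
\text{List} = \begin{bmatrix} 2 & 2 & 2 \\ 1 & 3 & 1 \\ 1 & 1 & 3 \\ 3 & 1 & 1 \end{bmatrix},
\qquad
\text{SpawnData} = \begin{bmatrix} 1 & 2 & 2 \\ 2 & 1 & 2 \\ 2 & 2 & 1 \end{bmatrix}
\]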

The routine initializes Size = 4. The first for loop selects the vector {2,2,2},
computes bounds for each vector element (1 in this case), creates SpawnData,
and calculates the incremental hypervolume based on the dimensions of the
lopped off hypercube, 1, 1, 1. These are multiplied together to give
lopOffVol = 1, which is added to the partial sum LebMeasure. Vector {2,2,2} is then
deleted from List.

The routine ndFilter deletes any vectors in SpawnData dominated by the
remaining vectors in List or that have elements equal to $m_i$ or $M_i$ (in that case,
the edge of a hypercube would be of zero length). The resulting SpawnData is
then added to List and newSize = 6. List and its SpawnData are now

\[
\text{List} = \begin{bmatrix} 1 & 2 & 2 \\ 2 & 1 & 2 \\ 2 & 2 & 1 \\ 1 & 3 & 1 \\ 1 & 1 & 3 \\ 3 & 1 & 1 \end{bmatrix},
\qquad
\text{SpawnData} = \begin{bmatrix} 0 & 2 & 2 \\ 1 & 1 & 2 \\ 1 & 2 & 1 \end{bmatrix}
\]

Now vector {1,2,2} will be deleted from List. Although List is larger, eventually
vectors in SpawnData will all contain $m_i$ and/or be dominated by vectors in List.
For example, all of the SpawnData vectors above will be deleted since the first
vector has an element at the lower bound (0), and the second and third vectors
are dominated by the remaining vectors in List. List thus shrinks from 6 vectors
to 5 and eventually to 1, breaking the while loop and ending the computation
with the last vector’s hypervolume being added to LebMeasure.

10 To simplify the expression, we will use $f_i$ for objective function values $f_i(x_1)$.

11 Note in this example that all objective functions are to be maximized.

The following figures depict the operation of this algorithm. Initially, List
corresponds to Figure 2a. After the first while loop, List contains 6 vectors
corresponding to Figure 2b where a hypercube with dimensions $1 \times 1 \times 1$ has been
“lopped off”. Its volume of 1 is added to the partial sum of LebMeasure.
Continuing with this procedure until it ends yields LebMeasure = 11. The reader
can verify the result by starting the procedure with any of the four points (i.e.,
the order of vectors in List makes no difference to the value of LebMeasure).

Fig. 2. Evolution of the Hypervolume. a. Initial List. b. List After First Iteration.
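
The lopping-off procedure of this section can be sketched in Python as follows. This is a minimal sketch under stated assumptions rather than a transcription of the pseudocode: all objectives are maximized, every point lies at or above the lower bounds $m_i$, the input is first reduced to a non-dominated list, and the loop simply runs until the list is empty, so the last vector is handled by the same rule as the others. The function names are hypothetical.

```python
from itertools import permutations
from math import prod

def weakly_dominates(q, p):
    """True if q is at least as large as p in every coordinate (maximization)."""
    return all(qi >= pi for qi, pi in zip(q, p))

def leb_measure(points, lower):
    """Hypervolume dominated by `points` above `lower`, obtained by
    repeatedly lopping off the hypercube exclusive to the first vector."""
    n = len(lower)
    work = []
    for p in map(tuple, points):
        if any(p[k] <= lower[k] for k in range(n)):
            continue  # a vector touching a bound dominates a zero-volume box
        if any(weakly_dominates(q, p) for q in work):
            continue  # dominated or duplicate vectors add nothing
        work = [q for q in work if not weakly_dominates(p, q)] + [p]

    total = 0
    while work:
        p = work.pop(0)
        # b[i]: greatest i-th coordinate in the rest of the list that lies
        # strictly below p[i], or the lower bound if there is none.
        b = [max([q[i] for q in work if q[i] < p[i]] + [lower[i]])
             for i in range(n)]
        total += prod(p[i] - b[i] for i in range(n))  # lopped-off volume
        spawned = [p[:i] + (b[i],) + p[i + 1:] for i in range(n)]
        # Filter: drop spawned vectors with an element at a bound or
        # dominated by a vector still in the list.
        keep = [s for s in spawned
                if all(s[k] > lower[k] for k in range(n))
                and not any(weakly_dominates(q, s) for q in work)]
        work = keep + work
    return total

pts = [(2, 2, 2), (1, 3, 1), (1, 1, 3), (3, 1, 1)]
volume = leb_measure(pts, (0, 0, 0))
```

Running the sketch on the four vectors of the example lops off volumes 1, 1, 1, 1, 2, 2 and 3 in turn, and the same total is obtained for every ordering of the input list.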


5.1  Complexity  of  Computing  LebMeasure 

Analyzing  the  long-run  behavior  of  LebMeasure  first  requires  an  assessment  of 
the  size  of  a  problem  instance.  Assuming  all  vectors  in  List  are  non-dominated, 
the  size  of  the  problem  instance  involves  m  x  n  values.  Including  the  bounds  for 
each  objective,  the  (m  +  l)n  numbers  are  sufficient  to  compute  the  hypervolume, 
hence  constitutes  the  size  of  the  problem  instance. 

The  time  complexity  of  the  algorithm  can  be  determined  by  first  observing, 
as  in  the  example,  that  the  size  of  List  may  grow  and  shrink  at  various  stages 
of  the  algorithm.  Thus,  the  key  to  analyzing  the  complexity  of  LebMeasure  is  in 
determining  the  manner  and  extent  to  which  spawned  vectors  are  added  to  and 
deleted  from  List.  Understanding  the  spawning  procedure  and  how  it  works  is 
therefore  critical  towards  understanding  the  complexity  of  LebMeasure.  Again, 
perusal  of  the  pseudocode  and  study  of  Figures  2a, b  should  help. 

Before proceeding further, an important distinction is made among the
vectors in List: those that are original members of List, and those that are spawned
from these original members of List. Those vectors having only one subscript,
e.g., $p_1$, are the original members of List while those with more than one
subscript, e.g., $p_{11}$, are spawned descendants from the original members.

One  important  property  of  the  spawning  procedure  is  that  the  total  number 
of  spawned  vectors  in  List  can  never  exceed  n,  the  number  of  objective  functions. 
This  property  can  help  in  analyzing  the  worst-case  scenario  and  is  shown  in  the 
following  arguments.  To  gain  insight  into  the  developing  patterns,  the  first  few 
iterations  of  the  while  loop  are  closely  examined. 

Without loss of generality, assume each objective is to be maximized. In the first iteration of the while loop, and for the worst-case analysis, assume that all the spawned vectors of $p_1$ in SpawnData are non-dominated with respect to the vectors in List. In this case, all of these vectors are added to List and $p_1$ is removed. These spawned vectors are indicated in (4). The length of List therefore increases from $m$ to at most $m + n - 1$ and evolves thusly:

\[
\begin{array}{c}
p_1,\ p_2,\ \ldots,\ p_m \\
\downarrow \\
p_{11},\ \ldots,\ p_{1n},\ p_2,\ \ldots,\ p_m
\end{array}
\]

requiring the following computational effort:

Table 1. Complexity of first while loop.

Subroutine    Complexity
BoundVal      (m - 1)n
Spawn         n
ndFiler       (m - 1)n
Total         2(m - 1)n + n

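Now the spawned vector $p_{11}$ (see (4)) will itself spawn the following $n$ vectors:

\[
\begin{aligned}
p_{111} &= \{\, \ell_1(p_{11}),\ f_2,\ \ldots,\ f_n \,\} \\
p_{112} &= \{\, \ell_1(p_1),\ \ell_2(p_1),\ \ldots,\ f_n \,\} \\
&\ \ \vdots \\
p_{11n} &= \{\, \ell_1(p_1),\ f_2,\ \ldots,\ \ell_n(p_1) \,\}
\end{aligned}
\tag{5}
\]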

Note that the vector $p_{111}$ contains $\ell_1(p_{11})$, the only new greatest lower bound, which, by definition, is less than $\ell_1(p_1)$. The other vectors in (5) are the same as those in (4) except that the $f_1$ in (4) has been replaced by the value $\ell_1(p_1)$. This means that the vectors in (4) dominate those in (5), i.e., for all $i$, $p_{1i}$ dominates $p_{11i}$. But because $p_{11}$ has been removed from List, no point currently in List necessarily dominates $p_{111}$. Consequently, the only possible point in the $p_{11*}$ generation (i.e., in (5)) that may be non-dominated is $p_{111}$, which for purposes of a worst-case analysis is added to List, replacing $p_{11}$. The size of List therefore remains at $m + n - 1$. The List data structure evolves thusly:

\[
\begin{array}{c}
p_{11},\ p_{12},\ \ldots,\ p_{1n},\ p_2,\ \ldots,\ p_m \\
\downarrow \\
p_{111},\ p_{12},\ \ldots,\ p_{1n},\ p_2,\ \ldots,\ p_m
\end{array}
\]

requiring the following computational effort:

Table 2. Complexity of while loop 2 to (m - 1).

Subroutine    Complexity
BoundVal      (m - 1)
Spawn         n
ndFiler       (m - 1)n
Total         (m - 1) + (m - 1)n + n

Now, $p_{111}$ is at the top of List and may spawn another set of vectors, but again, the same property as described above holds and may yield a maximum of only one non-dominated $p_{1111}$, and so on. Thus, for all these iterations, the size of List remains at $m + n - 1$.

Eventually, some descendant of $p_{11}$ spawns vectors of the form $p_{1 \ldots 1}$ which all become dominated by vectors in List or contain elements at the lower bound $m_i$, at which point no vectors in SpawnData get added to List for the next iteration and the last of all the descendants of $p_{11}$, along with $p_{11}$, are removed from List.

The question arises as to how many successive generations of $p_{11}$ are possible in the worst case. Obviously, it cannot be greater than $|List| - 1$. Consequently, an upper bound on the number of generations that the first spawned vector in List may spawn is $m - 1$. Thus, the complexity of while loop 3 to $(m - 1)$ is the same as given in Table 2.

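At this point in the algorithm, the next while loop starts with only $m + n - 2$ vectors $p_{12}, \ldots, p_{1n}, p_2, \ldots, p_m$. Now the spawned vector $p_{12}$ is at the top of List and spawns its descendants:

\[
\begin{aligned}
p_{121} &= \{\, \ell_1(p_1),\ \ell_2(p_1),\ \ldots,\ f_n \,\} \\
p_{122} &= \{\, f_1,\ \ell_2(p_{12}),\ \ldots,\ f_n \,\} \\
&\ \ \vdots \\
p_{12n} &= \{\, f_1,\ \ell_2(p_1),\ \ldots,\ \ell_n(p_1) \,\}
\end{aligned}
\tag{6}
\]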

and $p_{12}$ is itself removed, leaving $m + n - 3$ vectors in List to which are added the non-dominated vectors in SpawnData, some subset of (6). Again, the same pattern is present. That is, for all $k$, $p_{1k}$ dominates vectors $p_{12k}$. But now there are only $k - 2$ vectors from the first spawned set, $p_{13} \ldots p_{1k}$, left in List to dominate the $p_{12i}$ generation. Consequently, it is possible that two vectors, $p_{121}$ and $p_{122}$, may be non-dominated with respect to vectors in List. The size of List therefore increases from $m + n - 2$ back to at most $m + n - 3 + 2 = m + n - 1$. List becomes:

\[
\underbrace{p_{121},\ p_{122},\ p_{13},\ \ldots,\ p_{1n}}_{n \text{ vectors}},\ \underbrace{p_2,\ \ldots,\ p_m}_{m - 1 \text{ vectors}}
\]

and its length is again $m + n - 1$. The following general observation is made:

Each time one of the first spawned vectors (e.g., vectors with 2 subscripts such as $p_{11}$) is removed from List, the successive generations of the remaining vectors of the form $p_{1i}$ add vectors to List in sufficient numbers so that the length of List remains the same.

The decrease by 1 observed earlier happens at various points, but for purposes of a worst-case analysis it can be ignored (which also simplifies the analysis). Thus, with the bracketed term counting the work for each of the first $n$ spawned vectors, the following number of basic computations is required when the length of List is $m$:

\[
\Big[ \underbrace{2(m - 1)n + n}_{\text{first iteration}} + \underbrace{(m - 1)\big((m - 1) + (m - 1)n + n\big)}_{\text{iterations 2 to } m - 1} \Big]\, n
\]

which is of order $m^2 n^2$. After these calculations, the process begins with the original vector $p_2$ at the top of List with the size decremented. Accounting for this decrease in the length of List we have

\[
T(m, n) \approx \sum_{i=0}^{m-1} (m - i)^2 n^2 = n^2 \sum_{i=0}^{m-1} (m - i)^2 = n^2 \left( \frac{m(m + 1)(2m + 1)}{6} \right)
\]

from the Sum of Squares Formula [17, p. 199]. Consequently, the time complexity of LebMeasure is $T(m, n) \in O(m^3 n^2)$.
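The arithmetic behind these counts can be checked numerically. The sketch below simply evaluates the totals of Tables 1 and 2, the bracketed expression, and the sum of squares; the function names are illustrative and not part of the algorithm.

```python
def first_iteration_cost(m, n):
    """Table 1 total: cost of the first while loop."""
    return 2 * (m - 1) * n + n


def later_iteration_cost(m, n):
    """Table 2 total: cost of each of while loops 2 to m - 1."""
    return (m - 1) + (m - 1) * n + n


def per_original_vector_cost(m, n):
    """Bracketed expression times n, for a List of length m."""
    return (first_iteration_cost(m, n)
            + (m - 1) * later_iteration_cost(m, n)) * n


def total_cost_estimate(m, n):
    """T(m, n) taken as n^2 times the sum of (m - i)^2 for i = 0..m-1."""
    return n * n * sum((m - i) ** 2 for i in range(m))


def closed_form(m, n):
    """Sum of Squares Formula applied to the same sum."""
    return n * n * m * (m + 1) * (2 * m + 1) // 6


# Ratio of the bracketed cost to m^2 n^2 for a large instance.
ratio = per_original_vector_cost(1000, 1000) / (1000 ** 2 * 1000 ** 2)
```

For large m and n the ratio stays close to 1, which is the sense in which the bracketed expression is of order $m^2 n^2$, and the closed form grows as $m^3 n^2 / 3$.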

6  Future  Research  and  Conclusion 

This article described a set function $F(x_1, x_2, \ldots, x_m) = \mu(D_{S_m})$, a hypervolume on a point set $S_m$, that maps the arguments to a scalar and achieves its maximum value only when these arguments are distinct Pareto optima. Mapping Pareto optima to a scalar unifies the concepts of single and multi-objective optimization. A polynomial algorithm for calculating this hypervolume was also described.

Because this scalar provides necessary and sufficient conditions for the arguments to be Pareto optima, it is also the best measure for evaluating different multi-objective evolutionary algorithms. The many different GAs, for example, can all be evaluated according to the magnitude of this scalar quantity. Using a scalar quantity to evaluate performance also allows the average rate of convergence to be assessed; hence, the performance of evolutionary algorithms can be quantified using appropriate statistics to estimate the average hypervolume over many independent trials.
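As an illustration of the kind of statistic meant here, the following sketch summarizes the final hypervolume of several independent runs by its sample mean and standard error. The run values are hypothetical placeholders, not results from any experiment, and the choice of summary statistics is an assumption.

```python
from math import sqrt
from statistics import mean, stdev


def summarize_trials(hypervolumes):
    """Return (sample mean, standard error) of hypervolume over independent trials."""
    avg = mean(hypervolumes)
    sem = stdev(hypervolumes) / sqrt(len(hypervolumes))
    return avg, sem


# Hypothetical final hypervolumes from five independent runs of two algorithms.
runs_a = [10.0, 11.0, 11.0, 10.0, 11.0]
runs_b = [9.0, 10.0, 9.0, 10.0, 9.0]

mean_a, sem_a = summarize_trials(runs_a)
mean_b, sem_b = summarize_trials(runs_b)
```

Recording the hypervolume at fixed iteration counts and applying the same summary at each count would give an estimate of the average rate of convergence.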

Finally, this result shows how any multi-objective optimization problem can be put into standard math programming form with a single scalar objective function. As such, many global optimization metaheuristics can be recast to solve multi-objective problems. Simulated annealing, for example, and its parallel variants can be fashioned to converge in probability to Pareto optima. Future research will therefore describe the relative merits of using parallel versions of SA [18], with the hypervolume as the objective function, and various GA implementations.
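A minimal sketch of simulated annealing with the hypervolume as the scalar objective is given below. Everything specific in it is an assumption made for illustration: the toy two-objective minimization problem, the upper bounds, the neighborhood move, and the geometric cooling schedule. It is not the parallel SA of [18] and makes no claim about convergence.

```python
import math
import random

M = (7, 7)    # assumed upper bounds on the two objectives


def feasible(x):
    """Toy feasible region: integer pairs in 0..6 with a + b >= 6."""
    a, b = x
    return 0 <= a <= 6 and 0 <= b <= 6 and a + b >= 6


def hypervolume(solutions):
    """Area of the union of boxes [f(x), M] with f(x) = x, by counting unit cells."""
    return sum(any(a <= i and b <= j for a, b in solutions)
               for i in range(M[0]) for j in range(M[1]))


def anneal(m=3, steps=2000, t0=2.0, cooling=0.998, seed=0):
    """Maximize the hypervolume of a set of m solutions by simulated annealing."""
    rng = random.Random(seed)
    state = [(6, 6)] * m                 # start from the worst feasible point
    value = hypervolume(state)
    best, best_value = list(state), value
    temp = t0
    for _ in range(steps):
        k = rng.randrange(m)             # which argument of F to perturb
        axis = rng.randrange(2)
        step = rng.choice((-1, 1))
        cand = list(state[k])
        cand[axis] += step
        cand = tuple(cand)
        if feasible(cand):
            trial = state[:k] + [cand] + state[k + 1:]
            delta = hypervolume(trial) - value
            # Always accept improvements; accept losses with Boltzmann probability.
            if delta >= 0 or rng.random() < math.exp(delta / temp):
                state, value = trial, value + delta
                if value > best_value:
                    best, best_value = list(state), value
        temp *= cooling
    return best, best_value


best, best_value = anneal()
```

The state is the whole argument list of F rather than a single solution, which is what lets a single-objective metaheuristic search for a set of Pareto optima.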
Disclaimer

The views and conclusions contained in this document are those of the author and should not be interpreted as representing the official policies, either expressed or implied, of the Army Research Laboratory, the National Aeronautics and Space Administration, or the U.S. Government.
A  Proofs of Lemmas

This appendix contains restatements of the lemmas used in the text followed by their proofs.


A.1  Proof of Corollary to Lemma 1

Proof. From Definition 2, a point $p_y$ corresponding to solution $y$ is dominated by solution $x$ if and only if for all $i$, $f_i(x) \le f_i(y)$ and there is an $i$ such that $f_i(x) < f_i(y)$. The solution $y$ therefore satisfies all the criteria for inclusion in set $D_x$. Consequently, the statement from Lemma 1 that $p_x \in S$ is dominated if and only if there exists a $p_y \in S$ such that $p_x \in D_y$ is true, and taking the inverse of this statement yields the required result.


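A.2  Proof of Lemma 2

Proof. From Definition 4, $D_A = \bigcup_{p_i \in A} D_{p_i} = \{p : (\exists p_i)(p_i \in A \wedge p \in D_{p_i})\}$ and similarly for $D_B$. Therefore,

\[
\begin{aligned}
D_A \cup D_B &= \Big( \bigcup_{p_i \in A} D_{p_i} \Big) \cup \Big( \bigcup_{p_i \in B} D_{p_i} \Big) \\
&= \bigcup_{p_i \in A \vee p_i \in B} D_{p_i} = \bigcup_{p_i \in A \cup B} D_{p_i} = D_{A \cup B}.
\end{aligned}
\]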


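For non-empty $A \cap B$, $D_A \cap D_B = \{p : p_i \in A \wedge p_i \in B \Rightarrow p \in D_{p_i}\}$, hence by the distributive property we have

\[
\begin{aligned}
D_A \cap D_B &= \Big( \bigcup_{p_i \in A} D_{p_i} \Big) \cap \Big( \bigcup_{p_i \in B} D_{p_i} \Big) \\
&= \bigcup_{p_i \in A \wedge B} D_{p_i} = \bigcup_{p_i \in A \cap B} D_{p_i} = D_{A \cap B}.
\end{aligned}
\]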


The requirement that $A \cap B \ne \emptyset$ stems from the fact that for all sets $D_A$ and $D_B$, $D_A \cap D_B \ne \emptyset$. Consequently, for notational convenience, we require that the point sets $A$ and $B$ have elements in common.
A.3  Proof of Lemma 3

Proof. This proof relies on the fact that non-dominated points $p_x$ have the property that there is an objective function $f_i(x)$ that is strictly less than the same objective function evaluated for some other point in a set $S$. Notice that the definition of $D_x$ in Definition 5 above relies on the least upper bound among all the other points in set $S$. This inequality is therefore maintained for all points in $D_x$. This may help the reader see that each point $p \in D_x$ is non-dominated with respect to the other points in set $S$. The formal proof of the statement is most easily appreciated by using contradiction.

Assume set $D_x$ is defined as above by the non-dominated point $p_x \in S$ and that one of its elements $p' = (f'_1, \ldots, f'_m) \in D_x$ is dominated by a point $p_s = (f_1(x_s), \ldots, f_m(x_s)) \in S$. From Definition 2 of non-dominance, for all $i$,

\[
f_i(x_s) \le f'_i \quad \text{and there exists an } i \text{ such that} \quad f_i(x_s) < f'_i.
\tag{7}
\]

Now recall that $p_x$ is a non-dominated point with respect to set $S$. From Definition 2, for all points $p_y \in S$ there is an $i$ such that $f_i(x) < f_i(y)$. Let $i^*$ be such an $i$ for point $p_s$. Therefore, $f_{i^*}(x) < f_{i^*}(x_s)$. Thus, $f_{i^*}(x_s)$ is an upper bound of $f_{i^*}(x)$. From the definition of set $D'_x$, the least upper bound of vector element $f'_{i^*}$ in vectors in set $D_x$ is $u_{i^*}(p_x)$. Therefore, $f_{i^*}(x) \le f'_{i^*} < u_{i^*}(p_x) \le f_{i^*}(x_s)$, contradicting (7), which applies to all $i$. Consequently, there is no point in set $S \setminus p_x$ that dominates point $p' \in D_x$. Thus, for all $p \in S \setminus p_x$, $p' \notin D_p$ and therefore for all $p \in S \setminus p_x$, $D_x \cap D_p = \emptyset$.

Proof  of  the  Corollary  to  Lemma  3 

Proof. From the definition of $D_x$ and the fact that we are concerned with discrete optimization problems, each interval in the multi-interval that defines $D_x$ has a length $u_i(p_x) - f_i(x) > 0$, hence $\mu(D'_x) > 0$.

A.4  Proof of Lemma 4

Lemma 4: Given points $p_x, p_y \in S$, $p_x$ dominates $p_y$ if and only if $D_y \subset D_x$.

Proof. Recall Definition 2, which describes conditions whereby a point $p_x$ is non-dominated if and only if for all $p_y \in S$ there exists an $i$ such that $f_i(x) < f_i(y)$. Its negation therefore implies that a point $p_y$ is dominated if and only if there exists some feasible point $p_x$ such that for all $i$, $f_i(x) \le f_i(y)$ and there exists an $i$ such that $f_i(x) < f_i(y)$. From Definition 3, the elements of set $D_x$ are such that for all $i$ the following inequalities hold for each $f_i$: $f_i(x) \le f_i \le M_i$. But elements of set $D_y$ are such that $f_i(y) \le f_i \le M_i$. Since for all $i$, $f_i(x) \le f_i(y)$, each element of $D_y$ also satisfies the criteria for inclusion in set $D_x$. Thus, $p \in D_y \Rightarrow p \in D_x$, hence $D_y \subset D_x$.

To prove that $D_y \subset D_x \Rightarrow p_x$ dominates $p_y$, note that when all $p \in D_y \Rightarrow p \in D_x$, from Definition 3 all such points are dominated by $p_x$. It remains to show that $p_y$ is also dominated by $p_x$ (note from Definition 3 that points $p_x \notin D_x$ and $p_y \notin D_y$). From the definitions of $D_y$ and $D_x$, each multi-interval associated with $D_y$ is $[f_i(y), M_i]$ and for $D_x$ is $[f_i(x), M_i]$. Since $D_y \subset D_x$, it must be that each multi-interval associated with $y$ is a subset of the corresponding multi-interval for $x$, hence for all $i$, $f_i(x) \le f_i(y)$. Since this is a discrete problem domain and $p_x \ne p_y$, there must be some $\delta > 0$ where for some $i$, $f_i(x) + \delta \le f_i(y)$, hence for some $i$, $f_i(x) < f_i(y)$. Therefore, from Definition 2, $p_x$ dominates $p_y$.
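The equivalence in Lemma 4 can be checked by brute force on a small discrete instance. The sketch below is only an illustration under stated assumptions: two objectives to be minimized, integer objective values, hypothetical upper bounds `M`, and closed intervals $[f_i(x), M_i]$ standing in for the sets $D_x$.

```python
from itertools import product

M = (3, 3)    # assumed upper bounds M_i on a small two-objective grid


def dominates(x, y):
    """Definition 2 (minimization): no worse in every objective, better in one."""
    return (all(a <= b for a, b in zip(x, y))
            and any(a < b for a, b in zip(x, y)))


def region(x):
    """Grid points f with f_i(x) <= f_i <= M_i, standing in for D_x."""
    return set(product(*(range(a, hi + 1) for a, hi in zip(x, M))))


grid = list(product(range(M[0] + 1), range(M[1] + 1)))

# For every ordered pair of grid points, compare dominance with strict inclusion.
lemma4_holds = all(dominates(x, y) == (region(y) < region(x))
                   for x in grid for y in grid)
```

Here `region(y) < region(x)` is Python's proper-subset test on sets, so the check covers both directions of the "if and only if" for every pair on the grid.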


Proof  of  the  Corollary  to  Lemma  4 

Proof. It is a basic result of measure theory that for any two measurable sets $A$ and $B$, $\mu(A \cup B) = \mu(A) + \mu(B) - \mu(A \cap B)$ (lattice property). Therefore it follows that

\[
\mu(D_x \cup D_y) = \mu(D_x) + \mu(D_y) - \mu(D_x \cap D_y).
\tag{8}
\]

But if $p_x$ dominates $p_y$ then from Lemma 4, $D_y \subset D_x$ and it follows that $D_x \cap D_y = D_y$. Consequently, $\mu(D_x \cap D_y) = \mu(D_y)$ and using this in (8) yields the equality.

To show that $\mu(D_x) > \mu(D_y)$ it is sufficient to show that at least one of the factors in $\mu(D_x)$ is greater than the corresponding factor in $\mu(D_y)$. Since $p_x$ dominates $p_y$, then for all $i$, $f_i(x) \le f_i(y)$ and there is at least one $i$ where $f_i(x) < f_i(y)$. Consequently, for all $i$, $M_i - f_i(x) \ge M_i - f_i(y)$ and at least for one $i$, $M_i - f_i(x) > M_i - f_i(y)$. Therefore,

\[
\prod_{i=1}^{k} \big(M_i - f_i(x)\big) > \prod_{i=1}^{k} \big(M_i - f_i(y)\big),
\]

hence $\mu(D_x) > \mu(D_y)$.
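A short numeric check of this corollary follows. The upper bounds and objective vectors are hypothetical, and the helper names are illustrative; the measure of each set is taken as the product of interval lengths $M_i - f_i$, as in the proof.

```python
from math import prod


def measure(x, M):
    """mu(D_x) as the product of interval lengths M_i - f_i(x)."""
    return prod(hi - a for a, hi in zip(x, M))


def measure_of_intersection(x, y, M):
    """D_x and D_y are boxes sharing the corner M, so their overlap is a box too."""
    return prod(hi - max(a, b) for a, b, hi in zip(x, y, M))


def measure_of_union(x, y, M):
    """Lattice property: mu(A u B) = mu(A) + mu(B) - mu(A n B)."""
    return measure(x, M) + measure(y, M) - measure_of_intersection(x, y, M)


M = (10, 10, 10)    # hypothetical upper bounds
fx = (2, 3, 4)      # objective vector of x
fy = (2, 5, 6)      # objective vector of y, which x dominates

union = measure_of_union(fx, fy, M)
```

With these values the union has the same measure as $D_x$ alone, while $\mu(D_x)$ is strictly larger than $\mu(D_y)$.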

B  Proofs  of  Theorems 

B.1  Proof of Theorem 1

Proof. Note that Case 2 provides a stronger result as both the necessary and sufficient conditions for Pareto optimality are proved. Therefore, for the sake of simplicity, clarity, and brevity, we formally prove only Case 2, as Case 1 becomes sufficiently obvious from the details of proving Case 2. First we prove the implication (somewhat more informally stated) that

Case 2 ($m \ge p$): If $F$ is at its maximum value then there exists a subset of size $p$ of its arguments that are Pareto optimal and different from one another. Using contraposition, we prove the following equivalent statement:

If for all possible subsets $S' \subset S$ of size $p$, either not all points $p \in S'$ are Pareto optimal or there exist $p_k, p_l \in S'$ where $k \ne l$ and $p_k = p_l$, then $F$ is not maximized.
Assume all subsets of $S$ of size $p$ have fewer than $p$ distinct Pareto optimal points. Then all subsets can have at most $p - 1$ distinct Pareto optimal points. Since there are $p$ Pareto optimal points in the problem instance, there exists at least one Pareto optimal point $p_x$ that is not currently in set $S$. Further, there are at least $m - p + 1 > 0$ elements in $S$ that are not Pareto optimal. Also, bear in mind that some or even all of the points in set $S$ may be non-dominated. Thus, there are a number of different ways in which subsets do not have $p$ Pareto optimal points that must all be carefully considered. Finally, recall from Definition 1 that the $p$ Pareto optimal points dominate all other feasible points. Consider the following cases:

Case 2a: First, suppose that $S$ has $p$ Pareto optimal points, but that at least one of them has a multiplicity greater than one (i.e., at most $p - 1$ distinct Pareto optima). In this case, removing one such Pareto optimal point $p'$ does not change the measure of set $S$. Consequently, $D_{S \setminus p'} = D_S$ and therefore, adding point $p_x$ to $S \setminus p'$ yields the same number of arguments for function $F$ and

\[
D_{S \setminus p' \cup p_x} = D_{S \cup p_x}.
\]

But from Lemma 2

\[
D_{S \cup p_x} = D_S \cup D_{p_x} = D_S \cup (D_{p_x} \setminus D_S)
\]

which are mutually exclusive, hence

\[
\mu(D_{S \cup p_x}) = \mu(D_S) + \mu(D_{p_x} \setminus D_S).
\]

From the corollary of Lemma 3, $\mu(D_{p_x} \setminus D_S) > 0$ and therefore $\mu(D_S) + \mu(D_{p_x} \setminus D_S) > \mu(D_S)$ and $F$ with its original arguments is not maximized.

Case 2b: Now consider the case where there are fewer than $p$ Pareto optimal points. In this case, there are $m - p + 1 > 0$ non-Pareto optimal points in $S$. Choose one such non-Pareto optimal point $p_y \in S$. Since there exist $p$ Pareto optimal points that dominate all other feasible points, $p_y$ is either dominated by a Pareto optimal point already in $S$ or it is dominated by the Pareto optimal point $p_x$. In the former instance, we have the same situation as in Case 2a, i.e., removing $p_y$ from $S$ does not change its measure and adding $p_x$ increases the measure and again, $F$ with its original arguments is not maximized.

In the latter case where $p_y$ is dominated by $p_x$, define set $\hat{S} = S \setminus p_y$ and set $S^* = \hat{S} \cup p_x$ (i.e., by substituting $p_y$ with $p_x$ in $S$). Thus, sets $S$ and $S^*$ have the same number of elements $m$, hence $F$ has the same number of arguments. It is therefore sufficient to prove that $F$ is not maximized by showing that $\mu(D_{S^*}) > \mu(D_S)$ or, equivalently, that

\[
\mu(D_{\hat{S}} \cup D_x) > \mu(D_{\hat{S}} \cup D_y).
\tag{9}
\]

Partition set $D_x \setminus D_{\hat{S}}$ into two mutually exclusive subsets:

\[
(D_x \setminus D_{\hat{S}}) = \big((D_x \setminus D_{\hat{S}}) \setminus D_y\big) \cup \big((D_x \setminus D_{\hat{S}}) \cap D_y\big).
\tag{10}
\]
Since $p_x$ dominates $p_y$, from Lemma 4, $D_y \subset D_x$, hence the set $D_x$ exclusive of points in $D_{\hat{S}}$ but with points in $D_y$ is equivalent to the set of points in $D_y$ exclusive of points in $D_{\hat{S}}$, i.e.,

\[
(D_x \setminus D_{\hat{S}}) \cap D_y = (D_y \setminus D_{\hat{S}}).
\tag{11}
\]

Substituting (11) into (10) then

\[
(D_x \setminus D_{\hat{S}}) = \big((D_x \setminus D_{\hat{S}}) \setminus D_y\big) \cup (D_y \setminus D_{\hat{S}}).
\tag{12}
\]

Because this is a union of two mutually exclusive sets then,

\[
\mu(D_x \setminus D_{\hat{S}}) = \mu\big((D_x \setminus D_{\hat{S}}) \setminus D_y\big) + \mu(D_y \setminus D_{\hat{S}}).
\tag{13}
\]

From the Corollary to Lemma 3, the measure of the set $D_x$ exclusive of all other points in $D_{\hat{S}}$ is strictly greater than zero. Consequently, in (13) the term $\mu((D_x \setminus D_{\hat{S}}) \setminus D_y) > 0$ and (13) reduces to the inequality

\[
\mu(D_x \setminus D_{\hat{S}}) > \mu(D_y \setminus D_{\hat{S}}).
\tag{14}
\]

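Note however that $D_x$ and $D_y$ can each be partitioned into two mutually exclusive sets. Thus,

\[
\begin{aligned}
D_x &= (D_x \setminus D_{\hat{S}}) \cup (D_x \cap D_{\hat{S}}) \\
D_y &= (D_y \setminus D_{\hat{S}}) \cup (D_y \cap D_{\hat{S}})
\end{aligned}
\]

hence their measures are such that

\[
\begin{aligned}
\mu(D_x) &= \mu(D_x \setminus D_{\hat{S}}) + \mu(D_x \cap D_{\hat{S}}) \\
\mu(D_y) &= \mu(D_y \setminus D_{\hat{S}}) + \mu(D_y \cap D_{\hat{S}}).
\end{aligned}
\]

Consequently,

\[
\mu(D_x \setminus D_{\hat{S}}) = \mu(D_x) - \mu(D_x \cap D_{\hat{S}})
\tag{15}
\]

\[
\mu(D_y \setminus D_{\hat{S}}) = \mu(D_y) - \mu(D_y \cap D_{\hat{S}})
\tag{16}
\]

Substituting (15) and (16) into (14) we obtain

\[
\mu(D_x) - \mu(D_x \cap D_{\hat{S}}) > \mu(D_y) - \mu(D_y \cap D_{\hat{S}})
\tag{17}
\]

and adding $\mu(D_{\hat{S}})$ to both sides of (17) yields

\[
\mu(D_{\hat{S}}) + \mu(D_x) - \mu(D_x \cap D_{\hat{S}}) > \mu(D_{\hat{S}}) + \mu(D_y) - \mu(D_y \cap D_{\hat{S}}).
\]

From the lattice property of measurable sets we obtain (9), i.e., $\mu(D_{\hat{S}} \cup D_x) > \mu(D_{\hat{S}} \cup D_y)$. Therefore the function $F$ with its original arguments is not at its maximum value.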
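Inequalities (9) and (14) can be illustrated on a small two-objective minimization instance. The sketch below uses hypothetical integer points and upper bounds, and measures each set by counting the unit grid cells it covers; none of these values come from the text.

```python
def covered_cells(points, M):
    """Unit cells of the grid [0, M) covered by the union of boxes [p, M]."""
    return {(i, j)
            for i in range(M[0]) for j in range(M[1])
            if any(a <= i and b <= j for a, b in points)}


M = (6, 6)                    # hypothetical upper bounds
s_hat = [(1, 4), (4, 1)]      # the points of S other than p_y
p_y = (3, 3)                  # non-Pareto optimal point in S
p_x = (2, 2)                  # point that dominates p_y and is not in S

d_s_hat = covered_cells(s_hat, M)

# Both sides of (14): what each point covers beyond the set generated by s_hat.
gain_x = len(covered_cells([p_x], M) - d_s_hat)
gain_y = len(covered_cells([p_y], M) - d_s_hat)

# Both sides of (9): measure after and before substituting p_x for p_y.
mu_with_x = len(covered_cells(s_hat + [p_x], M))
mu_with_y = len(covered_cells(s_hat + [p_y], M))
```

In this instance the substitution raises the measure from 17 to 20 cells, and the difference equals `gain_x - gain_y`, matching the step from (14) to (9).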

The converse statement associated with Case 2, i.e., that if all points $p \in S$ are Pareto optimal and $k \ne l \Rightarrow p_k \ne p_l$ then $F$ attains its maximum value, can be proved by contradiction. Assume $S$ has a subset $S'$ of size $p$ whose elements are all distinct Pareto optimal points and $F$ is not at its maximum value. In this case, the measure $\mu(D_S) = \mu(D_{S'})$. To increase the value of $F$, the measure of some subset $D_{p_i} \subset D_{S'}$ must be increased. Suppose we increase it by selecting a feasible point $p^*$ whose set contains, and hence enlarges, set $D_{p_i}$. That is, $D_{p^*} \supset D_{p_i}$. From Lemma 4, $p^*$ then dominates $p_i$, contradicting the assumption that $p_i$ is Pareto optimal.


